Using the Output Embedding to Improve Language Models
We study the topmost weight matrix of neural network language models. We show
that this matrix constitutes a valid word embedding. When training language
models, we recommend tying the input embedding and this output embedding. We
analyze the resulting update rules and show that the tied embedding evolves in
a more similar way to the output embedding than to the input embedding in the
untied model. We also offer a new method of regularizing the output embedding.
Our methods lead to a significant reduction in perplexity, as we are able to
show on a variety of neural network language models. Finally, we show that
weight tying can reduce the size of neural translation models to less than half
of their original size without harming their performance.
Comment: To appear in EACL 2017
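As a concrete illustration of weight tying, here is a minimal PyTorch sketch
(the class and hyperparameters are illustrative, not the paper's exact setup)
in which the output projection reuses the input embedding matrix:

    import torch
    import torch.nn as nn

    class TiedLM(nn.Module):
        """Toy language model with tied input/output embeddings."""
        def __init__(self, vocab_size: int, hidden_size: int):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden_size)
            self.rnn = nn.LSTM(hidden_size, hidden_size, batch_first=True)
            self.out = nn.Linear(hidden_size, vocab_size, bias=False)
            # Weight tying: the output projection shares the input
            # embedding's (vocab_size x hidden_size) matrix, so a single
            # matrix serves both roles and receives both gradient signals.
            self.out.weight = self.embed.weight

        def forward(self, tokens: torch.Tensor) -> torch.Tensor:
            h, _ = self.rnn(self.embed(tokens))
            return self.out(h)  # logits over the vocabulary

Since the two vocabulary-sized matrices dominate the parameter count of such
models, sharing one of them is also where the reported size reduction for
translation models comes from.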
How Language Model Hallucinations Can Snowball
A major risk of using language models in practical applications is their
tendency to hallucinate incorrect statements. Hallucinations are often
attributed to knowledge gaps in LMs, but we hypothesize that in some cases,
when justifying previously generated hallucinations, LMs output false claims
that they can separately recognize as incorrect. We construct three
question-answering datasets where ChatGPT and GPT-4 often state an incorrect
answer and offer an explanation with at least one incorrect claim. Crucially,
we find that ChatGPT and GPT-4 can identify 67% and 87% of their own mistakes,
respectively. We refer to this phenomenon as hallucination snowballing: an LM
over-commits to early mistakes, leading to more mistakes that it otherwise
would not make.
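The detection procedure the abstract implies can be sketched as a two-stage
probe (a minimal sketch; the prompt wording and the generate callable are
placeholders, not the paper's actual prompts or API):

    def snowball_probe(generate, question: str, claim: str) -> dict:
        """Ask for an answer plus justification, then separately ask the
        same model to verify a claim taken from that justification.
        `generate` is any text-completion callable (placeholder)."""
        # Stage 1: the model commits to an answer and justifies it.
        answer = generate(f"Q: {question}\nAnswer Yes or No, then explain.")
        # Stage 2: a fresh context asks only about the supporting claim,
        # so the model is not conditioned on its earlier commitment.
        verdict = generate(f"Is the following claim true or false?\n{claim}")
        return {"answer": answer, "verdict": verdict}

Snowballing is diagnosed when stage 1 asserts a claim that stage 2, asked in
isolation, labels false.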
Measuring and Narrowing the Compositionality Gap in Language Models
We investigate the ability of language models to perform compositional
reasoning tasks where the overall solution depends on correctly composing the
answers to sub-problems. We measure how often models can correctly answer all
sub-problems but not generate the overall solution, a ratio we call the
compositionality gap. We evaluate this ratio by asking multi-hop questions with
answers that require composing multiple facts unlikely to have been observed
together during pretraining. In the GPT-3 family of models, we show that as
model size increases, single-hop question answering performance improves
faster than multi-hop performance, so the compositionality gap does not
decrease. This surprising result suggests that while more powerful
models memorize and recall more factual knowledge, they show no corresponding
improvement in their ability to perform this kind of compositional reasoning.
We then demonstrate how elicitive prompting (such as chain of thought)
narrows the compositionality gap by reasoning explicitly instead of implicitly.
We present a new method, self-ask, that further improves on chain of thought.
In our method, the model explicitly asks itself (and then answers) follow-up
questions before answering the initial question. We finally show that
self-ask's structured prompting lets us easily plug in a search engine to
answer the follow-up questions, which additionally improves accuracy.
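A minimal sketch of a self-ask-style loop with a pluggable search engine
(the two callables and the loop bound are illustrative; the marker strings
follow the scaffold the paper describes):

    def self_ask(generate, search, question: str, max_hops: int = 4) -> str:
        """Self-ask-style prompting (illustrative): the model writes its
        own follow-up questions, and a search engine answers each one
        before the model continues toward the final answer."""
        prompt = (f"Question: {question}\n"
                  "Are follow up questions needed here: Yes.\n")
        for _ in range(max_hops):
            step = generate(prompt, stop=["Intermediate answer:"])
            prompt += step
            if "So the final answer is:" in step:
                return step.split("So the final answer is:")[-1].strip()
            if "Follow up:" in step:
                follow_up = step.split("Follow up:")[-1].strip()
                # Route the sub-question to retrieval instead of the
                # model's parametric memory.
                prompt += f"Intermediate answer: {search(follow_up)}\n"
        return generate(prompt + "So the final answer is:").strip()

The fixed markers ("Follow up:", "Intermediate answer:", "So the final
answer is:") are what make it easy to intercept sub-questions and route them
to a search engine.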
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Language models have outpaced our ability to evaluate them effectively, but
for their future development it is essential to study the frontier of their
capabilities. We consider real-world software engineering to be a rich,
sustainable, and challenging testbed for evaluating the next generation of
language models. We therefore introduce SWE-bench, an evaluation framework
including 2,294 software engineering problems drawn from real GitHub issues
and corresponding pull requests across 12 popular Python repositories. Given
a codebase along with a description of an issue to be resolved, a language
model is tasked with editing the codebase to address the issue. Resolving
issues in SWE-bench frequently requires understanding and coordinating changes
across multiple functions, classes, and even files simultaneously, calling for
models to interact with execution environments, process extremely long contexts
and perform complex reasoning that goes far beyond traditional code generation.
Our evaluations show that both state-of-the-art proprietary models and our
fine-tuned model SWE-Llama can resolve only the simplest issues. Claude 2 and
GPT-4 solve a mere 4.8% and 1.7% of instances respectively, even when
provided with an oracle retriever. Advances on SWE-bench represent steps
towards LMs that are more practical, intelligent, and autonomous.
Comment: Data, code, and leaderboard are available at https://www.swebench.com
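The evaluation the abstract describes reduces to a patch-and-test loop; a
minimal sketch follows (helper names and the test command are hypothetical,
and the real harness adds isolated execution environments and per-instance
test selection):

    import subprocess

    def resolves_issue(repo_dir: str, base_commit: str, model_patch: str,
                       test_cmd: list[str]) -> bool:
        """SWE-bench-style check (illustrative): an instance counts as
        resolved only if the model's edit applies cleanly and the
        repository's tests pass afterwards."""
        subprocess.run(["git", "checkout", base_commit],
                       cwd=repo_dir, check=True)
        applied = subprocess.run(["git", "apply", "-"], cwd=repo_dir,
                                 input=model_patch, text=True)
        if applied.returncode != 0:
            return False  # the generated patch does not even apply
        tests = subprocess.run(test_cmd, cwd=repo_dir)
        return tests.returncode == 0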
Architecture of Planetary Systems Based on Kepler Data: Number of Planets and Coplanarity
We investigated the underlying architecture of planetary systems by deriving
the distribution of planet multiplicity (number of planets) and the
distribution of orbital inclinations based on the sample of planet candidates
discovered by the Kepler mission. The scope of our study included solar-like
stars and planets with orbital periods less than 200 days and with radii
between 1.5 and 30 Earth radii, and was based on Kepler planet candidates
detected during Quarters 1 through 6. We created models of planetary systems
with different distributions of planet multiplicity and inclinations, simulated
observations of these systems by Kepler, and compared the properties of the
transits of detectable objects to actual Kepler planet detections.
Specifically, we compared with both the Kepler sample's transit numbers and
normalized transit duration ratios in order to determine each model's
goodness-of-fit. We did not include any constraints from radial velocity
surveys. Based on our best-fit models, 75-80% of planetary systems have 1 or 2
planets with orbital periods less than 200 days. In addition, over 85% of
planets have orbital inclinations less than 3 degrees (relative to a common
reference plane). This high degree of coplanarity is comparable to that seen in
our Solar System. These results have implications for planet formation and
evolution theories. Low inclinations are consistent with planets forming in a
protoplanetary disk, followed by evolution without significant and lasting
perturbations from other bodies capable of increasing inclinations.
Comment: 16 pages, 7 figures, accepted to ApJ
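The model comparison described above is a forward Monte Carlo; the toy
version below (all distributions, scalings, and the detection criterion are
placeholder choices, not the paper's calibrated ones) shows its overall
shape: draw systems, decide which planets transit for a random observer, and
histogram the per-system transit counts against the Kepler sample:

    import numpy as np

    rng = np.random.default_rng(0)

    def simulated_transit_counts(n_systems: int) -> np.ndarray:
        """Toy forward model (illustrative): sample multiplicity and
        inclinations per system, then count transiting planets for a
        randomly oriented observer."""
        counts = []
        for _ in range(n_systems):
            n_planets = rng.integers(1, 9)                   # multiplicity model
            incl = np.deg2rad(rng.rayleigh(2.0, n_planets))  # tilt off a plane
            periods = rng.uniform(3.0, 200.0, n_planets)     # days, per the cut
            a_over_rstar = 4.0 * periods ** (2.0 / 3.0)      # rough Kepler scaling
            los = np.arccos(rng.uniform(-1.0, 1.0))          # observer direction
            # Crude geometric criterion: the orbit must pass within one
            # stellar radius of the line of sight for a transit.
            transits = np.abs(np.cos(los + incl)) < 1.0 / a_over_rstar
            counts.append(int(transits.sum()))
        return np.asarray(counts)

Comparing these simulated counts (and, in the paper, normalized transit
duration ratios as well) with the observed Kepler detections scores each
candidate multiplicity/inclination model's goodness-of-fit.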
The multi-configurational time-dependent Hartree method for bosons: Many-body dynamics of bosonic systems
The evolution of Bose-Einstein condensates is amply described by the
time-dependent Gross-Pitaevskii mean-field theory which assumes all bosons to
reside in a single time-dependent one-particle state throughout the propagation
process. In this work, we go beyond mean-field and develop an essentially-exact
many-body theory for the propagation of the time-dependent Schrödinger
equation of interacting identical bosons. In our theory, the time-dependent
many-boson wavefunction is written as a sum of permanents assembled from
orthogonal one-particle functions, or orbitals, where both the expansion
coefficients and the permanents (orbitals) themselves are time-dependent
and fully determined according to a standard time-dependent
variational principle. By employing either the usual Lagrangian formulation or
the Dirac-Frenkel variational principle we arrive at two sets of coupled
equations-of-motion, one for the orbitals and one for the expansion
coefficients. The first set comprises first-order differential equations in
time and non-linear integro-differential equations in position space, whereas
the second set consists of first-order differential equations with
time-dependent coefficients. We call our theory multi-configurational
time-dependent Hartree for bosons, or MCTDHB(M), where M specifies the
number of time-dependent orbitals used to construct the permanents. Numerical
implementation of the theory is reported and illustrative numerical examples of
many-body dynamics of trapped Bose-Einstein condensates are provided and
discussed.
Comment: 30 pages, 2 figures
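The ansatz the abstract describes can be written compactly (standard MCTDHB
notation is assumed here, not necessarily the paper's exact labels):

    % N bosons distributed over M time-dependent orbitals: a sum of
    % permanents with time-dependent expansion coefficients.
    \left|\Psi(t)\right\rangle
      = \sum_{\vec{n}} C_{\vec{n}}(t)\, \left|\vec{n}; t\right\rangle,
    \qquad
    \vec{n} = (n_1, \ldots, n_M), \quad \sum_{k=1}^{M} n_k = N,

where each |\vec{n}; t\rangle is the permanent with n_k bosons in the
orbital \phi_k(\mathbf{r}, t); the variational principle then yields the two
coupled sets of equations of motion, one for the C_{\vec{n}}(t) and one for
the orbitals.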
Time-dependent multi-orbital mean-field for fragmented Bose-Einstein condensates
The evolution of Bose-Einstein condensates is usually described by the famous
time-dependent Gross-Pitaevskii equation, which assumes all bosons to reside in
a single time-dependent orbital. In the present work we address the evolution
of fragmented condensates, for which two (or more) orbitals are occupied, and
derive a corresponding time-dependent multi-orbital mean-field theory. We call
our theory TDMF(n), where n stands for the number of evolving fragments.
Working equations for a general two-body interaction between the bosons are
explicitly presented along with an illustrative numerical example.
Comment: 16 pages, 1 figure
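In the same (assumed) notation as for MCTDHB above, the fragmented
mean-field ansatz fixes the occupations and lets only the orbitals evolve;
for two fragments:

    % A single permanent with fixed occupations n_1 + n_2 = N: only the
    % two orbitals evolve, unlike MCTDHB where the expansion coefficients
    % are time-dependent as well.
    \left|\Psi(t)\right\rangle = \left|n_1, n_2; t\right\rangle
      \propto \hat{\mathcal{S}} \prod_{i=1}^{n_1} \phi_1(\mathbf{r}_i, t)
        \prod_{j=n_1+1}^{N} \phi_2(\mathbf{r}_j, t),

with \hat{\mathcal{S}} the symmetrizer over the N bosons. This
single-permanent restriction is what makes TDMF a mean-field theory, in
contrast to the multi-configurational expansion above.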